/*Industry classification codes - 2022 december*/

*Correspondence between nomenclatures: EU and SCB have developed statistical nomenclatures, which has led to numerous structural changes. The most recent one is SNI 2007 (European equivalent is NACE 2), the previous systems are SNI 2002 (NACE 1.1), SNI 92 (NACE), and SNI 69. The SNI 2007 is the most detailed system, while the preceeding ones are coarses. Therefore, there is no 1:1 correspondence between them. 
clear all

*cd "/RFS_replication_package/Martinsson_et_al_datasets"


******************
*SNI 92- SNI 2007*
******************

*These excel files have already been saved as stata(.dta files)
import excel using "nyckeln20231_sni69_sni92_sni2002_sni2007.xlsx", clear firstrow sheet(Sheet1) /*The Excel file was formed based on merging Statistics Sweden's keys between each system. SCB provides matching in both directions: matching the older system with more recent ones and the other way round. The keys for the correspondence table below goes from SNI92 to SNI2007 as we will need to match SNI92 industry codes with SNI2007 alternatives.*/ 

rename SNI92_1 sni92
rename SNI2007_1 sni2007

keep sni92 sni2007

g sni2007_2d=substr(sni2007,1,2)  /*The idea is to keep the most frequent 2-digit and then corresponding 5-digit SNI codes*/
destring sni2007_2d, replace
sort sni92 sni2007_2d, stable

quietly by sni92 sni2007_2d: gen dup=cond(_N==1,0,_n)
egen dup_max=max(dup), by(sni92)

keep if dup==dup_max   /*We keep those the most frequent 2-digit SNI2007 codes, per SNI92 code*/
drop dup

sort sni92 sni2007, stable
quietly by sni92: gen dup=cond(_N==1,0,_n)
drop if dup>1
 
drop dup dup_max sni2007_2d
rename sni2007 sni2007_1

 
save correspondence_92_2007_20231014.dta, replace

******************
*SNI 2002- SNI 2007*
******************
*We can do this very similary to SNI 2007 and SNI 2002


import excel using "nyckeln20231_sni69_sni92_sni2002_sni2007.xlsx", clear firstrow sheet(Sheet2)  /*The Excel file was formed based on merging Statistics Sweden's keys between each system. SCB provides matching in both directions: matching the older system with more recent ones and the other way round. The keys for the correspondence table below goes from SNI2002 to SNI2007 as we will need to match SNI2002 industry codes with SNI2007 alternatives.*/ 

rename SNI2002_1 sni2002
rename SNI2007_1 sni2007

keep sni2002 sni2007

g sni2007_2d=substr(sni2007,1,2)  /*The idea is to keep the most frequent 2-digit and then corresponding 5-digit SNI 2007 codes per SNI 2002 code*/
destring sni2007_2d, replace
sort sni2002 sni2007_2d, stable

quietly by sni2002 sni2007_2d: gen dup=cond(_N==1,0,_n)
egen dup_max=max(dup), by(sni2002)

keep if dup==dup_max   /*We keep those the most frequent 2-digit SNI2007 codes, per SNI92 code*/
drop dup

sort sni2002 sni2007, stable
quietly by sni2002: gen dup=cond(_N==1,0,_n)
drop if dup>1
 
drop dup dup_max sni2007_2d

save correspondence_2002_2007_20231014.dta, replace


******************
*SNI 92- SNI 2002*
******************

import excel using "nyckeln20231_sni69_sni92_sni2002_sni2007.xlsx", clear firstrow sheet(Sheet1)/*The Excel file was formed based on merging Statistics Sweden's keys between each system. SCB provides matching in both directions: matching the older system with more recent ones and the other way round. The keys for the correspondence table below goes from SNI92 to SNI2002 as we will need to match SNI92 industry codes with SNI2002 alternatives.*/ 

rename SNI2002_1 sni2002
rename SNI92_1 sni92

keep sni2002 sni92 

g sni2002_2d=substr(sni2002,1,2)  /*The idea is to keep the most frequent 2-digit and then corresponding 5-digit SNI 2002 codes per SNI 92 code*/
destring sni2002_2d, replace
sort sni92 sni2002_2d, stable

quietly by sni92 sni2002_2d: gen dup=cond(_N==1,0,_n)
egen dup_max=max(dup), by(sni92)

keep if dup==dup_max   /*We keep those the most frequent 2-digit SNI2007 codes, per SNI92 code*/
drop dup

sort sni92 sni2002, stable
quietly by sni92: gen dup=cond(_N==1,0,_n)
drop if dup>1
 
drop dup dup_max sni2002_2d

save correspondence_92_2002_20231014.dta, replace

******************
*SNI 69- SNI 2007*
******************

import excel using "nyckeln20232d_sni69_sni92_sni2002_sni2007.xlsx", clear firstrow sheet(Sheet2) /*The Excel file was formed based on merging Statistics Sweden's keys between each system. SCB provides matching in both directions: matching the older system with more recent ones and the other way round. The keys for the correspondence table below goes from SNI69 to SNI2007 as we will need to match SNI69 industry codes with SNI2007 alternatives.*/ 

rename SNI2007_1 sni2007
rename SNI69_1 sni69

keep sni2007 sni69 

g sni2007_2d=substr(sni2007,1,2)  /*The idea is to keep the most frequent 2-digit and then corresponding 5-digit SNI 2007 codes per SNI 69 code*/
destring sni2007_2d, replace
sort sni69 sni2007_2d, stable

quietly by sni69 sni2007_2d: gen dup=cond(_N==1,0,_n)
egen dup_max=max(dup), by(sni69)

keep if dup==dup_max   /*We keep those the most frequent 2-digit SNI2007 codes, per SNI69 code*/

drop dup

sort sni69 sni2007, stable
quietly by sni69: gen dup=cond(_N==1,0,_n)
drop if dup>1
 
drop dup dup_max sni2007_2d
tostring sni69, replace

save correspondence_69_2007_20231014.dta, replace



******************
*SNI 69- SNI 2007-2d*
******************

import excel using "nyckeln20232d_sni69_sni92_sni2002_sni2007.xlsx", clear firstrow sheet(Sheet2) /*The Excel file was formed based on merging Statistics Sweden's keys between each system. SCB provides matching in both directions: matching the older system with more recent ones and the other way round. The keys for the correspondence table below goes from SNI2007 to SNI69 as we will need to match SNI2007 industry codes with SNI69 alternatives.*/ 

rename SNI2007_1 rbsni
rename SNI69_1 sni69

keep rbsni sni69 
sort sni69 rbsni, stable


duplicates drop rbsni, force
tostring sni69, replace

save correspondence_2dsni69_sni2007_20231014.dta, replace



